Overview

Dataset statistics

Number of variables: 4
Number of observations: 1000209
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.5 MiB
Average record size in memory: 32.0 B
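
The memory figures are consistent with one another. The following is a minimal sketch of the arithmetic, assuming that every column stores one 8-byte value per row (the report does not state the column dtypes, so `BYTES_PER_VALUE` is an assumption) and ignoring any index overhead:

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 1_000_209
n_columns = 4

# Assumption (not stated in the report): one 8-byte value per column per row.
BYTES_PER_VALUE = 8
MIB = 1024 ** 2

record_bytes = n_columns * BYTES_PER_VALUE    # bytes per record
total_mib = n_rows * record_bytes / MIB       # whole dataset, in MiB
column_mib = n_rows * BYTES_PER_VALUE / MIB   # a single column, in MiB

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the per-column figure also agrees with the "Memory size" reported for each variable.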

Variable types

Numeric: 3
Categorical: 1

Alerts

UserID is highly correlated with Timestamp (High correlation)
Timestamp is highly correlated with UserID (High correlation)

Reproduction

Analysis started: 2022-07-14 12:45:10.035761
Analysis finished: 2022-07-14 13:47:24.081674
Duration: 1 hour, 2 minutes and 14.05 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
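
The reported duration follows from the two analysis timestamps. A small sketch using only the standard library (the variable names are illustrative, not part of the report):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2022-07-14 12:45:10.035761")
finished = datetime.fromisoformat("2022-07-14 13:47:24.081674")

elapsed = finished - started
total_seconds = elapsed.total_seconds()

# Split the elapsed time into hours, minutes and fractional seconds.
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{int(hours)} hour, {int(minutes)} minutes and {seconds:.2f} seconds")
```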

Variables

UserID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6040
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3024.512348
Minimum: 1
Maximum: 6040
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 331
Q1: 1506
median: 3070
Q3: 4476
95-th percentile: 5740
Maximum: 6040
Range: 6039
Interquartile range (IQR): 2970

Descriptive statistics

Standard deviation: 1728.412695
Coefficient of variation (CV): 0.5714682223
Kurtosis: -1.20099506
Mean: 3024.512348
Median Absolute Deviation (MAD): 1465
Skewness: 0.005734559099
Sum: 3025144471
Variance: 2987410.444
Monotonicity: Increasing
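
Several of the UserID entries are derived from the others and can be cross-checked. A minimal sketch in plain Python, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared); the inputs are copied from the report and the variable names are illustrative:

```python
# Values copied from the UserID statistics.
minimum, maximum = 1, 6040
q1, q3 = 1506, 4476
mean = 3024.512348
std = 1728.412695
total = 3025144471
n_rows = 1_000_209  # "Number of observations" from the overview

value_range = maximum - minimum   # reported as 6039
iqr = q3 - q1                     # reported as 2970
cv = std / mean                   # reported as 0.5714682223
variance = std ** 2               # reported as 2987410.444
mean_from_sum = total / n_rows    # reported as 3024.512348

print(value_range, iqr)
print(f"CV={cv:.6f} variance={variance:.1f} mean={mean_from_sum:.4f}")
```

The same checks apply to the MovieID and Timestamp statistics with their respective inputs.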
Histogram with fixed size bins (bins=50)

Common Values

Value                Count   Frequency (%)
4169                 2314    0.2%
1680                 1850    0.2%
4277                 1743    0.2%
1941                 1595    0.2%
1181                 1521    0.2%
889                  1518    0.2%
3618                 1344    0.1%
2063                 1323    0.1%
1150                 1302    0.1%
1015                 1286    0.1%
Other values (6030)  984413  98.4%

Minimum 10 values

Value  Count  Frequency (%)
1      53     < 0.1%
2      129    < 0.1%
3      51     < 0.1%
4      21     < 0.1%
5      198    < 0.1%
6      71     < 0.1%
7      31     < 0.1%
8      139    < 0.1%
9      106    < 0.1%
10     401    < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
6040   341    < 0.1%
6039   123    < 0.1%
6038   20     < 0.1%
6037   202    < 0.1%
6036   888    0.1%
6035   280    < 0.1%
6034   21     < 0.1%
6033   60     < 0.1%
6032   104    < 0.1%
6031   51     < 0.1%

MovieID
Real number (ℝ≥0)

Distinct: 3706
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1865.539898
Minimum: 1
Maximum: 3952
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 172
Q1: 1030
median: 1835
Q3: 2770
95-th percentile: 3675
Maximum: 3952
Range: 3951
Interquartile range (IQR): 1740

Descriptive statistics

Standard deviation: 1096.040689
Coefficient of variation (CV): 0.587519297
Kurtosis: -1.111020976
Mean: 1865.539898
Median Absolute Deviation (MAD): 884
Skewness: 0.09243570938
Sum: 1865929796
Variance: 1201305.193
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count   Frequency (%)
2858                 3428    0.3%
260                  2991    0.3%
1196                 2990    0.3%
1210                 2883    0.3%
480                  2672    0.3%
2028                 2653    0.3%
589                  2649    0.3%
2571                 2590    0.3%
1270                 2583    0.3%
593                  2578    0.3%
Other values (3696)  972192  97.2%

Minimum 10 values

Value  Count  Frequency (%)
1      2077   0.2%
2      701    0.1%
3      478    < 0.1%
4      170    < 0.1%
5      296    < 0.1%
6      940    0.1%
7      458    < 0.1%
8      68     < 0.1%
9      102    < 0.1%
10     888    0.1%

Maximum 10 values

Value  Count  Frequency (%)
3952   388    < 0.1%
3951   40     < 0.1%
3950   54     < 0.1%
3949   304    < 0.1%
3948   862    0.1%
3947   55     < 0.1%
3946   100    < 0.1%
3945   43     < 0.1%
3944   9      < 0.1%
3943   96     < 0.1%

Rating
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.6 MiB

Value  Count
4      348971
3      261197
5      226310
2      107557
1      56174

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000209
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5
2nd row: 3
3rd row: 3
4th row: 4
5th row: 5

Common Values

Value  Count   Frequency (%)
4      348971  34.9%
3      261197  26.1%
5      226310  22.6%
2      107557  10.8%
1      56174   5.6%
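
The frequency column can be reproduced from the counts. The sketch below also derives a mean rating, which the report does not list because it treats Rating as categorical; reading the labels as the numbers 1 to 5 is an assumption made here for illustration:

```python
# Counts copied from the Rating "Common Values" table.
rating_counts = {4: 348971, 3: 261197, 5: 226310, 2: 107557, 1: 56174}

n_rows = sum(rating_counts.values())

# Share of each rating, in percent, rounded to one decimal as in the report.
shares = {rating: round(100 * count / n_rows, 1)
          for rating, count in rating_counts.items()}

# Assumption: treat the categorical labels as numeric scores.
mean_rating = sum(rating * count for rating, count in rating_counts.items()) / n_rows

for rating, share in shares.items():
    print(f"{rating}: {share}%")
print(f"mean rating: {mean_rating:.3f}")
```

The counts sum to the number of observations given in the overview, so no rating value falls outside these five categories.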

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
4      348971  34.9%
3      261197  26.1%
5      226310  22.6%
2      107557  10.8%
1      56174   5.6%

Most occurring characters

Value  Count   Frequency (%)
4      348971  34.9%
3      261197  26.1%
5      226310  22.6%
2      107557  10.8%
1      56174   5.6%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000209  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
4      348971  34.9%
3      261197  26.1%
5      226310  22.6%
2      107557  10.8%
1      56174   5.6%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000209  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
4      348971  34.9%
3      261197  26.1%
5      226310  22.6%
2      107557  10.8%
1      56174   5.6%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000209  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
4      348971  34.9%
3      261197  26.1%
5      226310  22.6%
2      107557  10.8%
1      56174   5.6%

Timestamp
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 458455
Distinct (%): 45.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 972243695.4
Minimum: 956703932
Maximum: 1046454590
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.6 MiB

Quantile statistics

Minimum: 956703932
5-th percentile: 958704090.8
Q1: 965302637
median: 973018006
Q3: 975220939
95-th percentile: 993074152.6
Maximum: 1046454590
Range: 89750658
Interquartile range (IQR): 9918302

Descriptive statistics

Standard deviation: 12152558.94
Coefficient of variation (CV): 0.01249949884
Kurtosis: 10.94997785
Mean: 972243695.4
Median Absolute Deviation (MAD): 5308808
Skewness: 2.765691163
Sum: 9.724468943 × 10^14
Variance: 1.476846888 × 10^14
Monotonicity: Not monotonic
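
The Timestamp values are easier to interpret as dates. The report does not state the unit, so the sketch below assumes they are Unix epoch seconds; `to_utc` is a hypothetical helper introduced here for illustration:

```python
from datetime import datetime, timezone

# Values copied from the Timestamp quantile statistics.
ts_min, ts_median, ts_max = 956703932, 973018006, 1046454590


def to_utc(ts: int) -> datetime:
    """Interpret ts as Unix epoch seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


span_seconds = ts_max - ts_min     # matches the reported Range
span_days = span_seconds / 86_400  # seconds per day

print("first :", to_utc(ts_min).isoformat())
print("median:", to_utc(ts_median).isoformat())
print("last  :", to_utc(ts_max).isoformat())
print(f"span  : {span_days:.1f} days")
```

Under that assumption the observations run from late April 2000 to the end of February 2003, and the positive skewness is consistent with most of them falling early in that window.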
Histogram with fixed size bins (bins=50)

Common Values

Value                  Count   Frequency (%)
975528402              30      < 0.1%
975440712              28      < 0.1%
975527781              28      < 0.1%
1025585635             27      < 0.1%
975528243              27      < 0.1%
975280276              26      < 0.1%
975528115              26      < 0.1%
975280390              25      < 0.1%
1025036288             25      < 0.1%
974698015              24      < 0.1%
Other values (458445)  999943  > 99.9%

Minimum 10 values

Value      Count  Frequency (%)
956703932  1      < 0.1%
956703954  2      < 0.1%
956703977  2      < 0.1%
956704056  5      < 0.1%
956704081  1      < 0.1%
956704191  3      < 0.1%
956704219  1      < 0.1%
956704257  3      < 0.1%
956704305  1      < 0.1%
956704448  1      < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
1046454590  1      < 0.1%
1046454548  2      < 0.1%
1046454443  1      < 0.1%
1046454338  1      < 0.1%
1046454320  1      < 0.1%
1046454282  1      < 0.1%
1046454260  1      < 0.1%
1046444711  1      < 0.1%
1046437932  1      < 0.1%
1046437879  1      < 0.1%

Interactions


Correlations


Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

   UserID  MovieID  Rating  Timestamp
0       1     1193       5  978300760
1       1      661       3  978302109
2       1      914       3  978301968
3       1     3408       4  978300275
4       1     2355       5  978824291
5       1     1197       3  978302268
6       1     1287       5  978302039
7       1     2804       5  978300719
8       1      594       4  978302268
9       1      919       4  978301368

Last rows

         UserID  MovieID  Rating  Timestamp
1000199    6040     2022       5  956716207
1000200    6040     2028       5  956704519
1000201    6040     1080       4  957717322
1000202    6040     1089       4  956704996
1000203    6040     1090       3  956715518
1000204    6040     1091       1  956716541
1000205    6040     1094       5  956704887
1000206    6040      562       5  956704746
1000207    6040     1096       4  956715648
1000208    6040     1097       4  956715569